Abstract 

The input-constrained erasure channel with feedback is considered, where the binary input sequence 
contains no consecutive ones, i.e., it satisfies the $(1,\infty)$-RLL constraint. We derive the capacity for this 
setting, which can be expressed as $C^{\mathrm{fb}}_\epsilon = \max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}$, where $\epsilon$ is the erasure probability and 
$H_b(\cdot)$ is the binary entropy function. Moreover, we prove that a priori knowledge of the erasure at 
the encoder does not increase the feedback capacity. The feedback capacity was calculated using an 
equivalent dynamic programming (DP) formulation with an optimal average-reward that is equal to the 
capacity. Furthermore, we obtained an optimal encoding procedure from the solution of the DP, leading 
to a capacity-achieving, zero-error coding scheme for our setting. DP is thus shown to be a tool not 
only for solving optimization problems such as capacity calculation, but also for constructing optimal 
coding schemes. The derived capacity expression also serves as the only non-trivial upper bound known 
on the capacity of the input-constrained erasure channel without feedback, a problem that is still open. 

Index Terms 

Feedback capacity, constrained coding, dynamic programming, binary erasure channel, runlength-limited 
(RLL) constraints. 
Fig. 1. System model for an input-constrained memoryless channel with perfect feedback. 


Memoryless channels have been the focus of research activity in information theory since 
they were introduced in 1948 by Shannon [1]. The capacity of a memoryless channel has an 
elegant, single-letter expression, $C = \sup_{p(x)} I(X;Y)$, and this can be calculated for a broad 
range of channels [2], [3]. When considering a memoryless channel with input that is constrained, 
the capacity is given by the maximum mutual information rate between the input and output 
sequences. The capacity calculation of such channels involves a calculation of the entropy rate 
of a Hidden Markov Model (HMM), since the transmission of a constrained sequence through 
a memoryless channel results in an output sequence that is described by an HMM. This makes 
the capacity of input-constrained memoryless channels difficult to compute [4]-[7]. 

Constrained coding arises naturally in many communication and recording systems [8], [9]; 
a common constraint that is useful in magnetic and optical recording is the $(d,k)$-runlength 
limited (RLL) constraint. A binary sequence satisfies this constraint if the number of zeros 
between any pair of successive ones is at least $d$ and at most $k$. This constraint has also recently 
appeared in code designs for energy harvesting systems, where communication is used not only 
for information transfer but also for charging the receiver's battery [10]. In this paper, we focus 
on the special case of the $(1,\infty)$-RLL constraint, in which no consecutive ones are allowed. 
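
As an illustration of the constraint just defined, the following minimal Python sketch checks whether a finite binary sequence satisfies the $(d,k)$-RLL constraint. The function name `satisfies_rll` is hypothetical, and the treatment of leading and trailing zero runs (checked against $k$ only) is an assumption, since the text does not fix a convention for the boundaries.

```python
import math


def satisfies_rll(bits, d, k=math.inf):
    """Check the (d, k)-RLL constraint on a sequence of 0/1 values.

    Zeros between two successive ones must number at least d and at most k.
    Assumption: leading and trailing zero runs are only checked against k,
    since no convention for boundary runs is stated in the text.
    """
    zero_run = 0       # length of the current run of zeros
    seen_one = False   # True once the first one has been read
    for bit in bits:
        if bit == 0:
            zero_run += 1
            if zero_run > k:
                return False
        else:
            if seen_one and zero_run < d:
                return False
            seen_one = True
            zero_run = 0
    return True
```

With `d=1` and the default `k`, the check reduces to the no-consecutive-ones condition studied in this paper.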

It is well known that feedback does not increase the capacity of a memoryless channel, as 
shown by Shannon [11]. However, Shannon's argument does not apply to memoryless channels 
with constrained inputs, and special tools are required to determine the capacity of such channels 
with or without feedback. 

We consider a $(1,\infty)$-RLL input-constrained binary erasure channel (BEC) with feedback, 
represented pictorially in Fig. 1, with the channel depicted in Fig. 2. Based on the message $M$ and 
the previous channel outputs, $y^{i-1}$, the encoder chooses the input $X_i$ such that the input constraint 
is satisfied. The mechanism of the BEC is simple: each transmitted bit is transformed into an 
erasure symbol with probability $\epsilon$ or received successfully with the complementary probability. 
The decoder estimates the message $M$ with low probability of error as a function of the output 
sequence $Y^n$. In this paper, we derive the explicit expression for the feedback capacity of the 
$(1,\infty)$-RLL input-constrained BEC. 

Fig. 2. Erasure channel with erasure probability $\epsilon$. 

The feedback capacity that is derived here also serves as an upper bound on the capacity of 
the $(1,\infty)$-RLL input-constrained BEC without feedback, a problem that is still open. A lower 
bound on the capacity of the non-feedback setting was derived in [12] by considering an input 
that is restricted to a first-order Markov process (first-order capacity). The lower bound in [12] and 
our feedback capacity are presented in Fig. 3, and it can be seen that the maximal gap is attained at 
$\epsilon = 0.71$, where the first-order capacity is $\approx 0.2354$ while the feedback capacity is $\approx 0.2547$. 



Fig. 3. Lower and upper bounds on the capacity of the input-constrained BEC without feedback. 
The relation between feedback-capacity calculation and dynamic programming (DP) first 
appeared in Tatikonda's thesis [13]. Subsequent works included the formulation of capacity 
as DP for channels where the state is a function of the input [14], Markov channels [15] and 
power-constrained Gaussian noise channels with memory [16]. To apply algorithms from DP, 
such as value and policy iteration, quantization is required, and therefore, only lower bounds 
were derived in the above papers. 

In [17] and [18], the feedback capacities of the trapdoor and Ising channels, respectively, were 
found by solving their corresponding Bellman equations. The idea is that the feedback capacity 
is equal to the optimal reward of the DP, and therefore, it suffices to find a solution which 
satisfies the Bellman equation [19]. Besides reward optimality verification, the Bellman equation 
also establishes a mechanism for optimal policy verification, which is a significant additional 
benefit. 

The novelty in our work is the derivation of the optimal input distribution from the Bellman 
equation solution. The optimal solution of the DP is then utilized to understand how the 
dynamic program evolves under an optimal policy. We show that converting the DP solution into 
channel coding terms results in a straightforward interpretation of the optimal encoding procedure. 
This encoding procedure led us to an innovative, zero-error coding scheme for our 
input-constrained setting. This establishes that DP is a useful tool not only for solving optimization 
problems, but also for deriving optimal coding schemes. 

We also consider an input-constrained BEC where the encoder knows ahead of time if there 
is an erasure in the channel. Clearly, the capacity of this non-causal setting is at least as large 
as that of the feedback setting. We show that the capacity of this setting 
coincides with our feedback capacity expression, and therefore, a priori knowledge of the erasure 
in the channel does not increase the feedback capacity. Although this finding and the coding 
scheme for the feedback setting are sufficient for the feedback-capacity derivation, we argue that 
the capacity-achieving coding scheme is hard to construct without the DP solution. 

The remainder of the paper is organized as follows. Section II includes notation and a description 
of the problem. Section III states the main results of this paper. In Section IV, we provide a 
brief review of infinite-horizon DP and present the DP formulation of the feedback capacity. 
In Section V, the DP for the erasure channel is calculated, evaluated numerically and, finally, 
we prove that the Bellman equation is satisfied. In Section VI, we present the derivation of the 
optimal scheme from the solution of the DP. In Section VII, we derive the capacity of the non-causal 
input-constrained BEC. Finally, the paper is concluded in Section VIII. 

II. Notation and Problem Definition 

Throughout this paper, random variables will be denoted by upper-case letters, such as $X$, 
while realizations or specific values will be denoted by lower-case letters, e.g., $x$. Calligraphic 
letters will denote the alphabets of the random variables, e.g., $\mathcal{X}$. Let $X^n$ denote the $n$-tuple 
$(X_1, \dots, X_n)$. For any scalar $\alpha \in [0,1]$, $\bar{\alpha}$ stands for $\bar{\alpha} = 1 - \alpha$. Let $H_b(\alpha)$ denote the binary 
entropy for scalar $\alpha \in [0,1]$, i.e., $H_b(\alpha) = -\alpha \log_2 \alpha - \bar{\alpha} \log_2 \bar{\alpha}$. Let $H_{ter}(\alpha_1, \alpha_2, \alpha_3)$ denote 
the ternary entropy for scalars $\alpha_1, \alpha_2, \alpha_3 \in [0,1]$ such that $\sum_i \alpha_i = 1$, i.e., $H_{ter}(\alpha_1, \alpha_2, \alpha_3) = 
-\sum_i \alpha_i \log_2 \alpha_i$. 

The communication setting of a memoryless channel with feedback is described in Fig. 1. A 
message $M$ is drawn uniformly from the set $\{1, \dots, 2^{nR}\}$ and made available to the encoder. The 
encoder at time $i$ knows the message $m$ and the feedback samples $y^{i-1}$, and produces a binary 
output, $x_i \in \{0,1\}$, as a function of $m$ and $y^{i-1}$. The sequence of encoder outputs, $x_1 x_2 x_3 \dots$, 
must satisfy the $(1,\infty)$-RLL input constraint of the channel, namely, no two consecutive ones 
are allowed. The channel is memoryless in the sense that the output at time $i$, given the existing 
information in the system, depends only on the current input, i.e., 

$$p(y_i \mid x^i, y^{i-1}) = p(y_i \mid x_i), \quad \forall i. \tag{1}$$

We focus on the erasure channel, shown in Fig. 2. The input alphabet is $\mathcal{X} = \{0,1\}$, while 
the output can take values in $\mathcal{Y} = \{0,1,?\}$. The erasure probability of the channel is $\epsilon$, which 
can take any value in $[0,1]$. 

Definition 1. An $(n, 2^{nR}, (1,\infty))$ code for a constrained-input channel with feedback is defined 
by a set of encoding functions: 

$$f_i : \{1, \dots, 2^{nR}\} \times \mathcal{Y}^{i-1} \to \mathcal{X}, \quad i = 1, \dots, n,$$

satisfying $f_i(m, y^{i-1}) = 0$ if $f_{i-1}(m, y^{i-2}) = 1$ for all $(m, y^{i-1})$, and a decoding function: 

$$\Psi : \mathcal{Y}^n \to \{1, \dots, 2^{nR}\}.$$

In addition, we define the non-causal $(1,\infty)$-RLL BEC. For this setting, all definitions remain 
the same as in the previous setting, but the encoder knows ahead of time whether there is an 
erasure in the channel. Formally, define $\theta_i$ as the indicator that corresponds to an erasure in the 
channel at time $i$, namely, $\theta_i = 0$ if $x_i = y_i$ and $\theta_i = 1$ otherwise. The set of encoding functions 
for this setup is then defined as: 

$$f_i : \{1, \dots, 2^{nR}\} \times \mathcal{Y}^{i-1} \times \{0,1\} \to \mathcal{X}, \quad i = 1, \dots, n,$$

satisfying $f_i(m, y^{i-1}, \theta_i) = 0$ if $f_{i-1}(m, y^{i-2}, \theta_{i-1}) = 1$ for all $(m, y^{i-1}, \theta_{i-1}, \theta_i)$. 

The average probability of error for a code is defined as $P_e^{(n)} = \Pr(M \neq \Psi(Y^n))$. A rate 
$R$ is said to be $(1,\infty)$-achievable if there exists a sequence of $(n, 2^{nR}, (1,\infty))$ codes such that 
$\lim_{n \to \infty} P_e^{(n)} = 0$. The capacity, $C^{\mathrm{fb}}_\epsilon$, defined to be the supremum over all $(1,\infty)$-achievable 
rates, is a function of the erasure probability $\epsilon$. Let $C^{\mathrm{nc}}_\epsilon$ denote the capacity for the non-causal 
$(1,\infty)$-RLL BEC. From operational considerations of the encoding functions for both settings, 
it is clear that $C^{\mathrm{nc}}_\epsilon \ge C^{\mathrm{fb}}_\epsilon$. 


III. Main Results 

The following is our main result concerning the capacity of the $(1,\infty)$-RLL constrained BEC 
with feedback. 

Theorem 1. The capacity of the $(1,\infty)$-RLL input-constrained erasure channel with feedback 
is 

$$C^{\mathrm{fb}}_\epsilon = \max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}. \tag{2}$$

Furthermore, the capacity is achieved by an explicit zero-error coding scheme that is presented 
in Section VI-B, in Algorithm 1 and Algorithm 2. 


In Fig. 4, the feedback capacity is evaluated for different values of the erasure probability $\epsilon$. As 
can be seen, the capacity is a decreasing function of $\epsilon$. For $\epsilon = 0$, the 
capacity is $C^{\mathrm{fb}}_0 \approx 0.6942$, which can be represented as $\log_2 \phi$, where $\phi$ is the golden ratio, and is 
known as the maximal entropy rate of a binary source with no consecutive ones. For $\epsilon = 1$, the capacity 
value is $C^{\mathrm{fb}}_1 = 0$, as expected. 
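
The capacity expression of Theorem 1, the maximum of $H_b(p) / (p + \frac{1}{1-\epsilon})$ over $0 \le p \le \frac{1}{2}$, is easy to evaluate numerically. The sketch below uses a plain grid search; the function names and the grid size are illustrative choices, not part of the paper.

```python
import math


def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def feedback_capacity(eps, grid=20001):
    """Grid-search maximum of H_b(p) / (p + 1/(1 - eps)) over 0 <= p <= 1/2."""
    if eps >= 1.0:
        return 0.0  # every symbol is erased, so nothing gets through
    c = 1.0 / (1.0 - eps)
    best = 0.0
    for i in range(grid):
        p = 0.5 * i / (grid - 1)
        best = max(best, binary_entropy(p) / (p + c))
    return best
```

For example, `feedback_capacity(0.0)` can be compared with $\log_2 \phi$, and `feedback_capacity(0.71)` with the value $\approx 0.2547$ quoted in the introduction.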

The capacity of the non-constrained BEC can be expressed as $\max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{\frac{1}{1-\epsilon}} = 1 - \epsilon$. Note 
that the only difference between this term and our capacity expression in (2) is the denominator. 
This fact hints that the capacity expressions of other input constraints may share a common 
structure. 
Fig. 4. The capacity $C^{\mathrm{fb}}_\epsilon$, as a function of $\epsilon$, of the $(1,\infty)$-RLL input-constrained BEC with feedback. 


The next theorem states that the non-causal $(1,\infty)$-RLL input-constrained BEC has the same 
capacity as the feedback setting. 

Theorem 2. Non-causal knowledge of erasures does not increase the feedback capacity, i.e., 

$$C^{\mathrm{nc}}_\epsilon = C^{\mathrm{fb}}_\epsilon.$$

Next, we show the properties of the capacity expression (2). 


Lemma 1. Define the function $f_\epsilon(p) = \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}$, where $p \in [0,1]$. The following properties hold 
for $f_\epsilon(p)$: 

• The function $f_\epsilon(p)$ is concave on $[0,1]$, for any $\epsilon > 0$. 

• The function $f_\epsilon(p)$ has only one maximum in $[0,1]$, which is the only real solution of the 
equation $p^{\frac{1}{1-\epsilon}} = (1-p)^{1 + \frac{1}{1-\epsilon}}$. This maximum lies in $[0, \frac{1}{2}]$. 

• Denote by $p_\epsilon$ the argument that achieves the maximum of $f_\epsilon(p)$. The capacity can also be 
expressed by 

$$C^{\mathrm{fb}}_\epsilon = \frac{-\log_2(p_\epsilon)}{1 + \frac{1}{1-\epsilon}}.$$

The proof of Lemma 1 is presented in Appendix A. 
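
The properties in Lemma 1 can be checked numerically. Setting the derivative of $f_\epsilon(p) = H_b(p)/(p + \frac{1}{1-\epsilon})$ to zero gives the stationarity condition $\ln p = (2-\epsilon)\ln(1-p)$, and substituting it back into $f_\epsilon$ gives the value $-\log_2(p_\epsilon)/(1 + \frac{1}{1-\epsilon})$. The sketch below, which assumes $\epsilon < 1$ and uses hypothetical function names, solves the condition by bisection and compares the two expressions.

```python
import math


def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def stationary_point(eps, iters=200):
    """Bisection for the root of ln(p) - (2 - eps) * ln(1 - p) on (0, 1/2].

    The left-hand side is increasing in p, negative near 0 and
    non-negative at p = 1/2 whenever eps < 1.
    """
    lo, hi = 1e-12, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - (2.0 - eps) * math.log(1.0 - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(p, eps):
    """The function f_eps(p) = H_b(p) / (p + 1/(1 - eps))."""
    return binary_entropy(p) / (p + 1.0 / (1.0 - eps))


def closed_form(p, eps):
    """Value of f_eps at its stationary point, written via log2(p) only."""
    return -math.log2(p) / (1.0 + 1.0 / (1.0 - eps))
```

For $\epsilon = 0$ the condition reduces to $p = (1-p)^2$, whose root in $[0, \frac{1}{2}]$ is $(3 - \sqrt{5})/2$.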





IV. Feedback Capacity and Dynamic Programming 


The normalized directed information was introduced by Massey in [20] as $\frac{1}{n} I(X^n \to Y^n) = 
\frac{1}{n} \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$. Massey showed that the maximum normalized directed information 
upper bounds the capacity of channels with feedback, and subsequently, it was proved that 
this expression indeed characterizes the feedback capacity for a broad class of channels [15], 
[21]-[24]. Of most relevance to our work is the feedback capacity of the unifilar finite state 
channel that was characterized in [17]. The next theorem follows from Theorem 1 in [17], by 
substituting $S_{t-1} = X_{t-1}$ as the channel state at time $t$. 

Theorem 3 (Theorem 1, [17]). The capacity of a $(1,\infty)$-RLL input-constrained memoryless 
channel with feedback can be written as: 

$$C^{\mathrm{fb}} = \sup \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I(X_t, X_{t-1}; Y_t \mid Y^{t-1}), \tag{3}$$

where the supremum is taken with respect to $\{p(x_t \mid x_{t-1}, y^{t-1}) : p(x_t = 1 \mid x_{t-1} = 1, y^{t-1}) = 0\}_{t \ge 1}$. 

Having written the capacity of the input-constrained channel with feedback as (3), we proceed 
to show that calculating the capacity can be formulated as an average-reward DP. 

A. Average-Reward Dynamic Programs 

Each DP is defined by the tuple $(\mathcal{Z}, \mathcal{U}, \mathcal{W}, F, P_Z, P_w, g)$. We consider a discrete-time dynamic 
system evolving according to: 

$$z_t = F(z_{t-1}, u_t, w_t), \quad t = 1, 2, \dots \tag{4}$$

Each state, $z_t$, takes values in a Borel space $\mathcal{Z}$, each action, $u_t$, takes values in a compact subset 
$\mathcal{U}$ of a Borel space, and each disturbance, $w_t$, takes values in a measurable space $\mathcal{W}$. The initial 
state, $z_0$, is drawn from the distribution $P_Z$, and the disturbance, $w_t$, is drawn from $P_w(\cdot \mid z_{t-1}, u_t)$. 
The history, $h_t = (z_0, w_1, \dots, w_{t-1})$, summarizes all the information available to the controller 
at time $t$. The controller at time $t$ chooses the action, $u_t$, by a function $\mu_t$ that maps histories to 
actions, i.e., $u_t = \mu_t(h_t)$. The collection of these functions is called a policy and is denoted as 
$\pi = \{\mu_1, \mu_2, \dots\}$. Note that given a policy, $\pi$, and the history, $h_t$, one can compute the actions 
vector, $u^t$, and the states of the system, $z_1, z_2, \dots, z_{t-1}$. 
Our objective is to maximize the average reward given a bounded reward function $g : \mathcal{Z} \times \mathcal{U} \to 
\mathbb{R}$. The average reward for a given policy $\pi$ is given by: 

$$\rho_\pi = \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_\pi \left[ \sum_{t=1}^{N} g(z_{t-1}, \mu_t(h_t)) \right],$$

where the subscript $\pi$ indicates that actions $u_t$ are generated by the policy $\pi$. The optimal average 
reward is defined as 

$$\rho^* = \sup_\pi \rho_\pi.$$

B. Formulation of the feedback capacity as DP 

The state of the dynamic program, $z_{t-1}$, is defined as the conditional probability vector 
$\beta_{t-1}(x_{t-1}) = p(x_{t-1} \mid y^{t-1})$. The action space, $\mathcal{U}$, is the set of stochastic matrices, $p(x_t \mid x_{t-1})$, 
satisfying the $(1,\infty)$-RLL constraint. For a given policy and an initial state, the encoder at time 
$t-1$ can calculate the state, $\beta_{t-1}(x_{t-1})$, since the tuple $y^{t-1}$ is available from the feedback. The 
disturbance is taken to be the channel output, $w_t = y_t$, and the reward gained at time $t-1$ is 
chosen as $I(Y_t; X_t, X_{t-1} \mid y^{t-1})$. The formulation is summarized in Table I. 

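Existence of System: We need to show that for a given policy, $\pi = \{\mu_1, \mu_2, \dots\}$, the state 
$z_t$ can be calculated from the tuple $(z_{t-1}, u_t, y_t)$. Consider, 

$$\begin{aligned}
\beta_t(x_t) &= p(x_t \mid y^t) \\
&= \sum_{x_{t-1}} p(x_t, x_{t-1} \mid y^t) \\
&= \frac{\sum_{x_{t-1}} p(x_t, x_{t-1}, y_t \mid y^{t-1})}{p(y_t \mid y^{t-1})} \\
&= \frac{\sum_{x_{t-1}} p(x_{t-1} \mid y^{t-1})\, p(x_t \mid x_{t-1}, y^{t-1})\, p(y_t \mid y^{t-1}, x_t, x_{t-1})}{\sum_{x_t, x_{t-1}} p(x_{t-1} \mid y^{t-1})\, p(x_t \mid x_{t-1}, y^{t-1})\, p(y_t \mid y^{t-1}, x_t, x_{t-1})} \\
&\overset{(a)}{=} \frac{\sum_{x_{t-1}} p(x_{t-1} \mid y^{t-1})\, p(x_t \mid x_{t-1}, y^{t-1})\, p(y_t \mid x_t)}{\sum_{x_t, x_{t-1}} p(x_{t-1} \mid y^{t-1})\, p(x_t \mid x_{t-1}, y^{t-1})\, p(y_t \mid x_t)} \\
&= \frac{\sum_{x_{t-1}} \beta_{t-1}(x_{t-1})\, u_t(x_t, x_{t-1})\, p(y_t \mid x_t)}{\sum_{x_t, x_{t-1}} \beta_{t-1}(x_{t-1})\, u_t(x_t, x_{t-1})\, p(y_t \mid x_t)},
\end{aligned} \tag{5}$$

where (a) follows from the memoryless property (1). Therefore, there exists a function $F$ such 
that $\beta_t = F(\beta_{t-1}, u_t, y_t)$. 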
TABLE I 

Formulation of capacity as DP 

| Input-constrained memoryless channel | Dynamic programming |
| $p(x_{t-1} \mid y^{t-1})$ | $z_{t-1}$, state at time $t-1$ |
| Constrained $p(x_t \mid x_{t-1})$ | $u_t$, action taken at time $t-1$ |
| $y_t$ | $w_t$, disturbance generated at time $t$ |
| Equation (5) | $z_t = F(z_{t-1}, u_t, w_t)$, system equation |
| $I(Y_t; X_t, X_{t-1} \mid y^{t-1})$ | $g(z_{t-1}, u_t)$, reward gained at time $t-1$ |


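Disturbance: Let us show that the disturbance distribution depends on the current state and 
action only, with no dependence on past information, i.e., $p(w_t \mid w^{t-1}, z^{t-1}, u^t) = p(w_t \mid z_{t-1}, u_t)$. 

$$\begin{aligned}
p(w_t \mid w^{t-1}, z^{t-1}, u^t) &= p(y_t \mid y^{t-1}, \beta^{t-1}, u^t) \\
&= \sum_{x_t, x_{t-1}} p(y_t, x_t, x_{t-1} \mid y^{t-1}, \beta^{t-1}, u^t) \\
&= \sum_{x_t, x_{t-1}} p(x_{t-1} \mid y^{t-1}, \beta^{t-1}, u^t)\, p(x_t \mid x_{t-1}, \beta^{t-1}, u^t, y^{t-1})\, p(y_t \mid x_t, x_{t-1}, \beta^{t-1}, u^t, y^{t-1}) \\
&\overset{(a)}{=} \sum_{x_t, x_{t-1}} \beta_{t-1}(x_{t-1})\, u_t(x_t, x_{t-1})\, p(y_t \mid x_t) \\
&= \sum_{x_t, x_{t-1}} p(y_t, x_t, x_{t-1} \mid \beta_{t-1}, u_t) \\
&= p(y_t \mid \beta_{t-1}, u_t),
\end{aligned}$$

where (a) follows from the fact that the value of $p(x_{t-1} \mid y^{t-1}, \beta^{t-1}, u^t)$ is determined by $\beta_{t-1}$, 
the fact that $x_t$ depends only on the triplet $(x_{t-1}, \beta_{t-1}, u_t)$, and finally, the fact that the channel 
is memoryless. 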

Reward: We need to show that the reward, $I(Y_t; X_t, X_{t-1} \mid y^{t-1})$, that is achieved at time $t-1$ 
is a function of the current state, $\beta_{t-1}(x_{t-1})$, and of the chosen action $u_t$. Note that the 
reward depends on the conditional distribution $p(y_t, x_t, x_{t-1} \mid y^{t-1})$ only. 

For an initial state $z_0$ and a given policy $\pi = \{\mu_1, \mu_2, \dots\}$, the term $\beta_{t-1}$ is determined by 
$y^{t-1}$. Let us show that the reward achieved at time $t-1$ depends on the current state, action 
and the channel characterization, 

$$\begin{aligned}
p(y_t, x_t, x_{t-1} \mid y^{t-1}) &\overset{(a)}{=} p(x_{t-1} \mid y^{t-1})\, p(x_t \mid x_{t-1}, y^{t-1})\, p(y_t \mid x_t) \\
&= \beta_{t-1}(x_{t-1})\, u_t(x_t, x_{t-1})\, p(y_t \mid x_t),
\end{aligned}$$

where (a) follows from the chain rule and the memoryless property (1). Recall that the term 
$p(y_t \mid x_t)$ is given by the channel characterization, and thus, the reward depends on the state, $\beta_{t-1}$, 
and the chosen action, $u_t$. Therefore, the reward at time $t-1$ can be written as: 

$$g(z_{t-1}, u_t) = I(Y_t; X_t, X_{t-1} \mid \beta_{t-1}, u_t).$$

It then follows that the optimal average reward of the DP is: 

$$\rho^* = \sup_\pi \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I_\pi(Y_t; X_t, X_{t-1} \mid Y^{t-1}),$$

where the subscript $\pi$ indicates that the mutual information is calculated with respect to the 
policy $\pi$. This term is the capacity for an input-constrained memoryless channel with feedback 
as presented in Theorem 3, and we conclude that the optimal average reward is equal to the 
capacity. 


V. Solution For the Erasure Channel 


This section is organized as follows: Section V-A formulates the feedback capacity of the BEC 
as a DP using the notation from Section IV-B. In Section V-B, we evaluate a numerical solution 
using the value iteration algorithm, and finally, in Section V-C, we present the Bellman equation 
and its solution for the BEC. The solution of the Bellman equation concludes the derivation of 
the feedback capacity expression in Theorem 1. 

A. Formulation of the erasure channel as DP 

The state of the DP at time $t-1$, $z_{t-1}$, is the probability vector $[p(x_{t-1} = 0 \mid y^{t-1}), p(x_{t-1} = 
1 \mid y^{t-1})]$. With some abuse of notation, we refer from now on to $z_{t-1} = p(x_{t-1} = 0 \mid y^{t-1})$ as the 
first component of the vector, which also determines the second component, since they sum to 
1. Each action, $u_t$, is a constrained $2 \times 2$ stochastic matrix, $p(x_t \mid x_{t-1})$, of the form: 

$$u_t = \begin{bmatrix} p(x_t = 0 \mid x_{t-1} = 0) & p(x_t = 1 \mid x_{t-1} = 0) \\ 1 & 0 \end{bmatrix}.$$
The disturbance $w_t$ is the channel output, $y_t$, and can take values in $\{0, 1, ?\}$. With the above 
definitions and (5), the system equation can be expressed as follows: 

$$z_t = \begin{cases} 1 & \text{if } w_t = 0, \\ 1 - z_{t-1} + z_{t-1} u_t(1,1) & \text{if } w_t = ?, \\ 0 & \text{if } w_t = 1. \end{cases} \tag{6}$$

At this point, to simplify notation, we note that $1 - z_{t-1} + z_{t-1} u_t(1,1)$ can be written as 
$1 - z_{t-1} u_t(1,2)$. We denote $\delta_t = z_{t-1} u_t(1,2)$, and this implies the constraint $0 \le \delta_t \le z_{t-1}$, 
since $u_t$, by definition, must be a stochastic matrix. Furthermore, when investigating the relation 
of DP and encoding procedures, $u_t$ has to be recovered from $\delta_t$, given $z_{t-1}$. This calculation 
is trivial for $z_{t-1} \neq 0$, while for $z_{t-1} = 0$, we note that $u_t(1,2)$ has no effect on the DP, and 
therefore, $u_t(1,2)$ can be fixed to zero. 
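
To see how the general Bayes update of the state specialises to the three-case system equation for the BEC, the following sketch computes $z_t = p(x_t = 0 \mid y^t)$ directly from the previous state, the action entry $u_t(1,2) = p(x_t = 1 \mid x_{t-1} = 0)$ and the output, and compares it with the closed form. Function names and the string encoding of outputs (`"0"`, `"1"`, `"?"`) are illustrative assumptions.

```python
def channel_prob(y, x, eps):
    """BEC transition probability p(y | x) with y in {'0', '1', '?'}."""
    if y == "?":
        return eps
    return (1.0 - eps) if y == str(x) else 0.0


def belief_update(z_prev, u12, y, eps):
    """Next state z_t = p(x_t = 0 | y^t) computed by Bayes' rule.

    z_prev is p(x_{t-1} = 0 | y^{t-1}) and u12 is p(x_t = 1 | x_{t-1} = 0).
    Returns None when the output y has probability zero.
    """
    beta_prev = [z_prev, 1.0 - z_prev]
    u = [[1.0 - u12, u12], [1.0, 0.0]]  # rows: x_{t-1}; columns: x_t
    joint = [
        sum(beta_prev[xp] * u[xp][x] for xp in (0, 1)) * channel_prob(y, x, eps)
        for x in (0, 1)
    ]
    total = sum(joint)
    return joint[0] / total if total > 0.0 else None


def closed_form(z_prev, u12, y):
    """Three-case system equation for the BEC."""
    if y == "0":
        return 1.0
    if y == "1":
        return 0.0
    return 1.0 - z_prev * u12
```

Note that the erasure probability cancels in the update, which is why it does not appear in the closed form.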

To calculate the reward, the conditional distribution $p(x_t, x_{t-1}, y_t \mid z_{t-1}, u_t)$ is described in Table 
II, and it follows that the reward is: 

$$\begin{aligned}
g(z_{t-1}, u_t) &= I(Y_t; X_t, X_{t-1} \mid z_{t-1}, u_t) \\
&= H(Y_t \mid z_{t-1}, u_t) - H(Y_t \mid X_t, X_{t-1}, z_{t-1}, u_t) \\
&\overset{(a)}{=} H_{ter}((1-\delta_t)\bar{\epsilon}, \epsilon, \delta_t \bar{\epsilon}) - H_b(\epsilon) \\
&\overset{(b)}{=} H_b(\epsilon) + \bar{\epsilon} H_b(\delta_t) - H_b(\epsilon) \\
&= \bar{\epsilon} H_b(\delta_t),
\end{aligned}$$

where (a) follows from the marginal distribution $p(y_t \mid z_{t-1}, u_t)$ in Table II and the definition of 
$\delta_t$, while (b) follows from an easily verifiable identity: $H_{ter}(a\bar{b}, \bar{a}\bar{b}, b) = H_b(b) + \bar{b} H_b(a)$, for all 
$a, b \in [0,1]$. 
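
The entropy identity used in step (b), $H_{ter}(a(1-b), (1-a)(1-b), b) = H_b(b) + (1-b) H_b(a)$, and the resulting reward $(1-\epsilon) H_b(\delta_t)$ can be spot-checked numerically. The sketch below is only a sanity check on a grid of values, not a proof; the helper names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def identity_gap(a, b):
    """Absolute difference between the two sides of the ternary identity."""
    lhs = entropy([a * (1.0 - b), (1.0 - a) * (1.0 - b), b])
    rhs = entropy([b, 1.0 - b]) + (1.0 - b) * entropy([a, 1.0 - a])
    return abs(lhs - rhs)


def reward(delta, eps):
    """H(Y_t | z, u) - H(Y_t | X_t, X_{t-1}, z, u) for the BEC."""
    h_output = entropy([(1.0 - delta) * (1.0 - eps), eps, delta * (1.0 - eps)])
    return h_output - entropy([eps, 1.0 - eps])
```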


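TABLE II 

The conditional distribution $p(x_t, x_{t-1}, y_t \mid z_{t-1}, u_t)$ 

| $x_t$ | $x_{t-1}$ | $y_t = 0$ | $y_t = ?$ | $y_t = 1$ |
| 0 | 0 | $z_{t-1} u_t(1,1) \bar{\epsilon}$ | $z_{t-1} u_t(1,1) \epsilon$ | 0 |
| 1 | 0 | 0 | $z_{t-1} u_t(1,2) \epsilon$ | $z_{t-1} u_t(1,2) \bar{\epsilon}$ |
| 0 | 1 | $(1 - z_{t-1}) \bar{\epsilon}$ | $(1 - z_{t-1}) \epsilon$ | 0 |
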
To apply the value iteration in the next subsection, it is convenient to define the operator of 
the DP: 

$$(Th)(z) = \sup_{u \in \mathcal{U}} \left( g(z, u) + \int P_w(dw \mid z, u)\, h(F(z, u, w)) \right), \tag{7}$$

for all functions $h : \mathcal{Z} \to \mathbb{R}$. 

For our case, the operator of the DP takes the form of 

$$(Th_\epsilon)(z) = \sup_{0 \le \delta \le z} \bar{\epsilon} H_b(\delta) + (1-\delta)\bar{\epsilon}\, h_\epsilon(1) + \epsilon\, h_\epsilon(1-\delta) + \delta \bar{\epsilon}\, h_\epsilon(0), \tag{8}$$

for all $h_\epsilon : [0,1] \to \mathbb{R}$, where the subscript $\epsilon$ indicates that $h_\epsilon$ depends on the parameter $\epsilon$. 


B. Numerical evaluation 

Now that we have the DP formulation for our problem, we can apply the value iteration 
algorithm to estimate the optimal average reward. The value iteration algorithm simply applies 
the DP operator from (8) successively, and it has the form $h_k(z) = (Th_{k-1})(z)$ with $h_0(z) = 0$. 
The state of the DP and the values in the action matrices are continuous, so they cannot be 
represented exactly on a finite-precision computer. Therefore, a quantization of 5000 points in the 
unit interval for both $z_t$ and $\delta_t$ was performed, and the results after 20 iterations are presented 
in Fig. 5 for erasure probability $\epsilon = 0.5$. 
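
The value iteration described above can be sketched in a few lines. The code below is not the implementation used for the reported figures: it assumes the operator $(Th)(z) = \sup_{0 \le \delta \le z} (1-\epsilon) H_b(\delta) + (1-\delta)(1-\epsilon) h(1) + \epsilon h(1-\delta) + \delta (1-\epsilon) h(0)$, uses one shared uniform grid for $z$ and $\delta$ (so that $1-\delta$ is again a grid point), and estimates the average reward from the last increment $h_k - h_{k-1}$. Grid size and iteration count are illustrative defaults.

```python
import math


def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def value_iteration(eps, n_points=1001, n_iter=100):
    """Quantized value iteration h_k = T h_{k-1}, starting from h_0 = 0.

    Returns the grid, the final value function and the last increment
    h_k - h_{k-1}, whose entries approximate the optimal average reward.
    """
    grid = [j / (n_points - 1) for j in range(n_points)]
    ent = [binary_entropy(d) for d in grid]
    h = [0.0] * n_points
    gain = list(h)
    for _ in range(n_iter):
        h_new = []
        running_max = -math.inf
        for j, delta in enumerate(grid):
            # Objective for action delta; h[n_points - 1 - j] is h(1 - delta).
            objective = ((1.0 - eps) * ent[j]
                         + (1.0 - delta) * (1.0 - eps) * h[-1]
                         + eps * h[n_points - 1 - j]
                         + delta * (1.0 - eps) * h[0])
            # The sup over delta <= z is a running maximum along the grid.
            running_max = max(running_max, objective)
            h_new.append(running_max)
        gain = [new - old for new, old in zip(h_new, h)]
        h = h_new
    return grid, h, gain
```

For $\epsilon = 0.5$ the increments settle near $0.4057$, in line with the simulated average reward of $0.4056$ reported in this section.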


[Figure 5: two panels plotted against the state $z$: the action $\delta_{20}$ and the value function $h_{20}$.] 

Fig. 5. Value iteration evaluation for the erasure channel with $\epsilon = 0.5$. The algorithm was implemented with 20 iterations and 
quantization of 5000 points for both action and state. 
We also simulated the system with the estimated optimal action $\delta_{20}$. The initial state, $z_0$, was 
chosen to be zero and the action was taken according to $\delta_{20}$, which led to a gained reward. The 
disturbance was generated randomly according to the induced distribution from Table II. Given 
the current state, action and disturbance, the new state was calculated and the process 
was repeated $10^6$ times. This simulation led to an approximate average reward of 0.4056 and the 
histogram of the states is shown in Fig. 6. The significance of a discrete histogram 
will be discussed in Section VI, where it is explained how the DP simulation leads us to derive 
an optimal coding scheme for our channel setting. 
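
A simulation of this kind can be sketched as follows. Instead of the numerically estimated action $\delta_{20}$, the sketch assumes the stationary policy $\delta(z) = \min(z, p)$, where $p$ maximizes $H_b(p)/(p + \frac{1}{1-\epsilon})$ over $[0, \frac{1}{2}]$; this choice is an assumption made here for illustration. The state is updated to $1$, $1-\delta$ or $0$ when the output is $0$, an erasure or $1$, respectively, and the per-step reward is $(1-\epsilon) H_b(\delta)$.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def best_p(eps, grid=50001):
    """Grid argmax of H_b(p) / (p + 1/(1 - eps)) over 0 <= p <= 1/2."""
    c = 1.0 / (1.0 - eps)
    candidates = (0.5 * i / (grid - 1) for i in range(grid))
    return max(candidates, key=lambda p: binary_entropy(p) / (p + c))


def simulate(eps, n_steps=200_000, seed=0):
    """Run the DP under delta(z) = min(z, p); return average reward and visit counts."""
    rng = random.Random(seed)
    p = best_p(eps)
    z = 0.0            # initial state z_0
    total_reward = 0.0
    visits = {}
    for _ in range(n_steps):
        visits[round(z, 6)] = visits.get(round(z, 6), 0) + 1
        delta = min(z, p)
        total_reward += (1.0 - eps) * binary_entropy(delta)
        r = rng.random()
        if r < eps:                                # output '?'
            z = 1.0 - delta
        elif r < eps + (1.0 - eps) * delta:        # output '1'
            z = 0.0
        else:                                      # output '0'
            z = 1.0
    return total_reward / n_steps, visits
```

Under this policy only three distinct states are ever visited, which matches the discrete histogram discussed above.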



Fig. 6. Histogram of system states after $10^6$ runs. 


C. The Bellman Equation 

In dynamic programming, the Bellman equation suggests a sufficient condition for average 
reward optimality. This equation establishes a mechanism for verifying that a given average 
reward is optimal. The next result encapsulates the Bellman equation and can be found in [25]. 

Theorem 4 (Theorem 6.1, [25]). If $\rho \in \mathbb{R}$ and a bounded function $h : \mathcal{Z} \to \mathbb{R}$ satisfies for all 
$z \in \mathcal{Z}$: 

$$\rho + h(z) = \sup_{u \in \mathcal{U}} \left( g(z, u) + \int P_w(dw \mid z, u)\, h(F(z, u, w)) \right), \tag{9}$$

then $\rho = \rho^*$. Furthermore, if there is a function $\mu : \mathcal{Z} \to \mathcal{U}$ such that $\mu(z)$ attains the supremum 
for each $z$, then $\rho_\pi = \rho^*$ for $\pi = \{\mu_0, \mu_1, \dots\}$ with $\mu_t(h_t) = \mu(z_{t-1})$ for each $t$. 
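For our DP, substituting (8) into (9) yields the following Bellman equation: 

$$h_\epsilon(z) + \rho_\epsilon = \sup_{0 \le \delta \le z} \bar{\epsilon} H_b(\delta) + \bar{\epsilon}(1-\delta)\, h_\epsilon(1) + \epsilon\, h_\epsilon(1-\delta) + \bar{\epsilon}\delta\, h_\epsilon(0), \tag{10}$$

for all $z \in [0,1]$. Let us define two constants, $\rho^*_\epsilon$ and $p_\epsilon$, 

$$\rho^*_\epsilon = \max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}, \qquad p_\epsilon = \arg\max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}, \tag{11}$$

and a bounded function, 

$$h^*_\epsilon(z) = \begin{cases} \bar{\epsilon} H_b(z) - z \bar{\epsilon}\, \dfrac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{1-\epsilon}} & \text{if } 0 \le z \le p_\epsilon, \\[2ex] \dfrac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{1-\epsilon}} & \text{if } p_\epsilon \le z \le 1. \end{cases} \tag{12}$$

We proceed to show the DP solution by solving (10). 

Theorem 5. The constant $\rho^*_\epsilon$ and the function $h^*_\epsilon(z)$ given in (11) and (12), respectively, satisfy 
the Bellman equation (10) for each $\epsilon$. Therefore, $\rho^*_\epsilon$ is the optimal average reward. 

As $\rho^*_\epsilon$ is equal to the capacity expression (2), Theorem 5 concludes the proof of the first part 
of Theorem 1. The proof of Theorem 5 is presented in Appendix B. 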
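
The claim of Theorem 5 can be checked numerically, independently of the formal proof. The sketch below assumes $\rho = H_b(p)/(p + \frac{1}{1-\epsilon})$ at the maximizer $p$, and a candidate function equal to $(1-\epsilon) H_b(z) - z(1-\epsilon)\rho$ for $z \le p$ and to $\rho$ for $z \ge p$. It then measures the largest gap between $h(z) + \rho$ and the supremum over $0 \le \delta \le z$ of $(1-\epsilon) H_b(\delta) + (1-\epsilon)(1-\delta) h(1) + \epsilon h(1-\delta) + (1-\epsilon)\delta h(0)$, with the supremum approximated on a grid. Function names and grid sizes are illustrative, and $\epsilon < 1$ is assumed.

```python
import math


def hb(x):
    """Binary entropy H_b(x) in bits, with H_b(0) = H_b(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def solve(eps, iters=200):
    """Return (p, rho): the maximizer and maximum of H_b(p) / (p + 1/(1 - eps)).

    The maximizer is found by bisection on the stationarity condition
    ln(p) = (2 - eps) * ln(1 - p), obtained by differentiating the ratio.
    """
    lo, hi = 1e-12, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - (2.0 - eps) * math.log(1.0 - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, hb(p) / (p + 1.0 / (1.0 - eps))


def h_star(z, eps, p, rho):
    """Candidate relative value function (two-piece form assumed above)."""
    if z <= p:
        return (1.0 - eps) * hb(z) - z * (1.0 - eps) * rho
    return rho


def bellman_residual(eps, n_states=101, n_actions=2001):
    """Largest |h(z) + rho - sup_delta(...)| over a grid of states z."""
    p, rho = solve(eps)
    h0, h1 = h_star(0.0, eps, p, rho), h_star(1.0, eps, p, rho)
    worst = 0.0
    for i in range(n_states):
        z = i / (n_states - 1)
        best = -math.inf
        for j in range(n_actions):
            d = z * j / (n_actions - 1)   # candidate action in [0, z]
            value = ((1.0 - eps) * hb(d) + (1.0 - eps) * (1.0 - d) * h1
                     + eps * h_star(1.0 - d, eps, p, rho) + (1.0 - eps) * d * h0)
            best = max(best, value)
        worst = max(worst, abs(h_star(z, eps, p, rho) + rho - best))
    return worst
```

A residual that is small relative to the grid resolution is consistent with the theorem; it does not replace the proof in the appendix.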

VI. Derivation of the capacity-achieving coding scheme from the DP solution 

In this section, we derive the optimal coding scheme using the DP solution and finally show 
that this leads to a capacity-achieving coding scheme. The method comprises recovering the 
optimal constrained input distributions $\{p(x_t \mid x_{t-1}, y^{t-1})\}_{t \ge 1}$ from the solution of the DP. 

A. Relation of the Coding Scheme to Dynamic Programming Results 

The histogram for $\epsilon = 0.5$, in Fig. 6, shows that under an optimal policy, $\delta^*$, the system 
evolves between three steady states. Moreover, the solution of the Bellman equation indicates 
that there exists an optimal stationary policy, and therefore, we look at the stationary phase of 
the DP. The states, $z$, take values in the finite set $\{0, 1-p, 1\}$, with $p = p_\epsilon$ (Eq. (11)); the 
subscript $\epsilon$ is omitted for convenience, but all details are discussed for a fixed $\epsilon \in [0,1]$ and its 
corresponding $p_\epsilon$. For each state, the optimal policy, $\delta^*$, is known from the Bellman equation 
and arrows can be drawn between the states as a function of the disturbance. The state diagram 
for our DP is presented in Fig. 7. 

Fig. 7. State diagram of the DP for the input-constrained BEC under an optimal policy. 

Converting the state diagram in Fig. 7 into channel coding terms, using the formulation 
described in Table I, results in an encoding procedure as described in Fig. 8. Specifically, the 
states, $p(x_{t-1} = 0 \mid y^{t-1})$, take values from $\{0, 1-p, 1\}$. Each state has its corresponding action, 
$p(x_t = 1 \mid x_{t-1} = 0)$, and the encoding procedure evolves as a function of the output $y_t$. Recall 
that $p(x_t = 0 \mid x_{t-1} = 1) = 1$, and therefore, the action $p(x_t = 1 \mid x_{t-1} = 0)$ is sufficient to 
determine the transition matrix between $X_{t-1}$ and $X_t$. 

Let us explain how the encoding procedure evolves. We refer to the state $p(x_{t-1} = 0 \mid y^{t-1}) = 1$ 
as the ground state, since this indicates that '0' was received at the decoder and, therefore, the 
encoder is allowed to transmit any input to the channel. For the ground state, the next transmitted 
bit is distributed according to Ber($p$), and this is shown to be the optimal action. 

Upon receiving $y_t = 0$ at the decoder, the system remains at the ground state and the encoding 
procedure starts over again. When the output is $y_t = 1$, the system moves to the state $p(x_{t-1} = 
0 \mid y^{t-1}) = 0$. At this state, since the last input was necessarily '1', the encoder is forced to 
transmit '0'. Therefore, the decoder knows that '0' is the only legitimate input, and the system 
returns to the ground state regardless of whether the input was erased or not. 

The remaining scenario to examine begins at the ground state and is followed by $y_t = ?$. The 
optimal action at the lower state, $p(x_{t-1} = 0 \mid y^{t-1}) = 1 - p$, suggests that if '0' is erased, the new 
transmitted bit should be distributed according to Ber($\frac{p}{1-p}$). The term $\frac{p}{1-p}$ is in the unit interval, 
since $p \le \frac{1}{2}$. Additionally, the input constraint implies that if '1' was erased then '0' should be 
transmitted. Upon consecutive erasures, the encoder continues to transmit bits according to this 
policy. When an output is not an erasure, the system returns to the ground state, and this might 
take one or two time instances, depending on whether the (unerased) output bit is '0' or '1'. 

Fig. 8. Optimal encoding procedure for the input-constrained BEC. This encoding procedure was obtained from Fig. 7 by 
converting states, actions and disturbances into their corresponding channel coding terms. 
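
The encoding procedure described above can be simulated as a randomized input process driven by the feedback. In the sketch below, the encoder sends '0' after any '1', draws the input from Ber($p$) in the ground state, and draws it from Ber($\frac{p}{1-p}$) when the previous input was an erased '0'. The state names (`"ground"`, `"one"`, `"erased"`) and the function name are hypothetical, and $p$ is passed in as a parameter rather than optimized.

```python
import random


def run_encoding_procedure(p, eps, n_steps, seed=0):
    """Simulate channel inputs and outputs of the three-state procedure.

    States track the last relevant output: "ground" (a '0' was received, or a
    forced '0' was just sent), "one" (a '1' was received) and "erased".
    """
    rng = random.Random(seed)
    q = p / (1.0 - p)
    state = "ground"
    x_prev = 0
    xs, ys = [], []
    for _ in range(n_steps):
        if x_prev == 1:
            x = 0                          # input constraint forces a zero
        elif state == "ground":
            x = int(rng.random() < p)      # Ber(p)
        else:
            x = int(rng.random() < q)      # erased '0': Ber(p / (1 - p))
        y = "?" if rng.random() < eps else str(x)
        if y == "0":
            state = "ground"
        elif y == "1":
            state = "one"
        else:
            # An erased forced zero still returns the system to the ground state.
            state = "ground" if state == "one" else "erased"
        xs.append(x)
        ys.append(y)
        x_prev = x
    return xs, ys
```

By construction the inputs never contain two consecutive ones; the long-run fraction of ones can be compared with $p / (1 + p(1-\epsilon))$, which follows from the stationary distribution of the three states.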

The main challenge is to understand how this encoding procedure can be interpreted as 
transmitting a message by the encoder. Let the messages be points in the unit interval, i.e., 
messages take values in the set $\mathcal{M} = \{\frac{i}{2^{nR}}\}_{i=0}^{2^{nR}-1}$. At each time instance, the unit interval 
contains sub-intervals with labels that can be '0' or '1', and the input to the channel is simply 
the label of the sub-interval containing the message. Such an association of messages into a 
specified interval has been done before in [26]-[29]. 

Fig. 9. Example for transmitting the black-dot message using the encoding procedure in Fig. 8 for 3 time instances. The initial 
partition at the ground state is according to $p$, and the encoder transmits '0' since the black-dot message falls within $[0, \bar{p})$. 
Upon a successful transmission, the encoder moves back to the ground state and a new procedure begins. In case of erasure, 
we move to $t = 2$, and the interval that was labelled '0' is partitioned according to $q = \frac{p}{\bar{p}}$. The input constraint is preserved 
since the interval $[\bar{p}, 1)$, which was labelled '1', is now flipped to '0'. The encoder transmits '1' since the message falls within 
$[\bar{p}\bar{q}, \bar{p})$. In case of another erasure, a partition according to $q$ should be performed for the intervals that are labelled '0'. These intervals 
are $[0, \bar{p}\bar{q})$ and $[\bar{p}, 1)$, whose lengths sum to $1 - p$. Since $q = \frac{p}{\bar{p}}$, we simply change the label of $[\bar{p}, 1)$ (which has length 
$p$) to '1', and the label of $[0, \bar{p}\bar{q})$ remains '0'. The input constraint is preserved since $[\bar{p}\bar{q}, \bar{p})$ is re-labelled as '0'. Upon further 
erasures, the labelling alternates between the ones presented at $t = 2$ and $t = 3$ until a successful transmission. Note 
that the labellings at $t = 1$ and $t = 3$ are essentially the same. 

The partition into sub-intervals will be according to parameters p and q = as described 
in Fig. [8} When performing a partition at the ground state, the lower interval is labelled '0' while 
the upper interval is labelled '1'. Before providing the precise encoding algorithm, it will be 
convenient to understand the labelling process in the example described in Fig. [9j 

As can be seen in Fig. 9, all the proposed partitions in Fig. 8 can be encapsulated into two
possible labellings. We denote the labelling at t = 1 as $L_1$, and the labelling at t = 2 as $L_2$.
The initial labelling at the ground state is chosen as $L_1$ and, upon erasure, the current labelling
is replaced with the other labelling. Note that changing the labelling $L_i$ to $L_j$ for $i \neq j$
preserves the input constraint and can be done simply by exchanging the labels of $[\bar{p}\bar{q}, \bar{p})$ and
$[\bar{p}, 1)$, while the label of $[0, \bar{p}\bar{q})$ remains '0'.

To summarize at this point, at each time instant, we have two possible labellings (which
depend on the value of $\epsilon$) of the unit interval, which uniquely define the mapping from messages
to the channel input. The current labelling is determined only by the output tuple, $y^{t-1}$, and
therefore, the decoder and encoder both agree on it.
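The two labellings can be sketched in a few lines of Python. This is only an illustration under the assumption $0 < p \le 1/2$; the function name `label` and the grid check are ours and do not appear in the paper.

```python
def label(u, p, which):
    """Label (0 or 1) of a point u in [0, 1) under labelling L1 or L2.

    Assumes 0 < p <= 1/2, so that q = p / (1 - p) <= 1.
    """
    p_bar = 1.0 - p
    q = p / p_bar
    split = p_bar * (1.0 - q)          # equals 1 - 2p
    if which == 1:                     # L1: [0, p_bar) -> '0', [p_bar, 1) -> '1'
        return 1 if u >= p_bar else 0
    # L2: [0, split) -> '0', [split, p_bar) -> '1', [p_bar, 1) -> '0'
    return 1 if split <= u < p_bar else 0


# Fraction of a fine grid labelled '1' under each labelling (should be close to p).
p, n = 0.3, 10_000
grid = [i / n for i in range(n)]
masses = [sum(label(u, p, w) for u in grid) / n for w in (1, 2)]
print(masses)
```

No point is labelled '1' under both labellings, which is why exchanging $L_1$ and $L_2$ after an erasure never produces two consecutive ones.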


B. Capacity-achieving Coding Scheme 

At time instance $t - 1$, the set of possible messages is defined as $\mathcal{M}_{t-1} = \{m \in \mathcal{M} :
p(m|y^{t-1}) > 0\}$, with $\mathcal{M}_0 = \mathcal{M}$. The conditional distribution $p(m|y^{t-1})$ is calculated using
Bayes' rule, using the fact that the encoding procedure and both labellings are revealed to all
parties before transmission begins. Note that the set of possible messages can also be calculated
at the encoder, since the output tuple, $y^{t-1}$, is available from the feedback.

Any received symbol at the decoder might reduce the set of potential messages, and a
successful transmission is defined as a transmission where the size of the set of possible messages
is changed, namely, $|\mathcal{M}_t| < |\mathcal{M}_{t-1}|$. Specifically, a successful transmission can occur in one
of two scenarios; the first is $y_t = 1$, and the second is where $y_t = 0$ and $y_{t-1} \neq 1$. Upon
a successful transmission, the set of possible messages is calculated and expanded uniformly
to the unit interval. To be precise, the messages in the set $\mathcal{M}_t$ take values in $\{ i / |\mathcal{M}_t| \}_{i=0}^{|\mathcal{M}_t|-1}$.
This transmission procedure continues repeatedly until the set of possible messages contains one
message. The detailed encoding and decoding procedures are described in Algorithms 1 and 2.
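The following Python sketch simulates the encoder and decoder jointly on a finite message set. It is an illustration only: messages are represented by their index among the current candidates, the number of candidates labelled '1' is taken as `max(1, floor(p*k))` (our assumption, made so that the sketch also terminates for very small sets, in place of the separate treatment of the last bits), and the names `labelling` and `run_scheme` are ours.

```python
import random


def labelling(k, ones, which):
    """Labels for k equally spaced candidates under L1 (which=1) or L2 (which=2)."""
    if which == 1:
        return [1 if i >= k - ones else 0 for i in range(k)]
    return [1 if k - 2 * ones <= i < k - ones else 0 for i in range(k)]


def run_scheme(num_messages, message, p, eps, rng):
    """Simulate encoder and decoder jointly; return (decoded message, channel inputs)."""
    assert 0 < p <= 0.5 and 0 <= eps < 1
    candidates = list(range(num_messages))   # known to both sides through feedback
    sent = []
    while len(candidates) > 1:
        k = len(candidates)
        ones = max(1, int(p * k))            # assumption: at least one candidate labelled '1'
        which = 1                            # the ground state uses L1
        while True:
            labels = labelling(k, ones, which)
            x = labels[candidates.index(message)]
            sent.append(x)
            if rng.random() >= eps:          # symbol received without erasure
                break
            which = 3 - which                # erasure: exchange L1 and L2
        candidates = [m for m, lab in zip(candidates, labels) if lab == x]
        if x == 1:
            sent.append(0)                   # forced '0' after a received '1'
    return candidates[0], sent


rng = random.Random(1)
decoded, sent = run_scheme(1024, 777, 0.3, 0.5, rng)
print(decoded, len(sent))
```

Because the candidate set is updated only from non-erased outputs, the decoder can track it without error, and the transmitted sequence never contains two consecutive ones.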

Rate Analysis: The main feature of this coding scheme is that the total length of the sub-intervals
labelled '1' is $p$. This property is recorded as Lemma 2.

Lemma 2. At any step of the message transmission process, the lengths of the sub-intervals that
are labelled by '1' sum up to $p$.

Proof: Throughout transmission, there are two possible labellings; for $L_1$, the interval $[\bar{p}, 1)$
that is labelled '1' has length $p$, while for $L_2$, the interval $[\bar{p}\bar{q}, \bar{p})$ has length $\bar{p}q = p$. ■

Algorithm 1 Encoding Procedure

while Set of possible messages contains more than one message do
    Label the unit interval according to $L_1$.
    Transmit the label of the sub-interval containing the message.
    while Received symbol is an erasure do
        Exchange the labels of $[\bar{p}\bar{q}, \bar{p})$ and $[\bar{p}, 1)$.
        Transmit the label of the sub-interval containing the message.
    end while
    if Received symbol is '0' then
        Denote the messages within sub-intervals which are labelled '0' as the set of possible messages.
    else
        Denote the messages within sub-intervals which are labelled '1' as the set of possible messages.
        Transmit '0'.
    end if
    Expand the set of possible messages to the unit interval.
end while


From Lemma 2, we note that the encoder transmits '1' if the message falls within a sub-interval that
has length $p$. However, the messages are discrete points and a partition might fall between
two messages. This implies that the transmitted bit is distributed as $\mathrm{Ber}(p + e_i)$, where $e_i$ is a
correction factor. In Appendix C it is shown that the correction factor has a negligible effect
on the rate of the coding scheme. To simplify the derivations here, with some loss of accuracy,
we say that each transmitted bit is distributed according to $\mathrm{Ber}(p)$.

In the next lemma, we show that each successful transmission reduces the expected number
of bits that is required to describe the set of possible messages by $H_b(p)$.

Lemma 3. With each successful transmission, the expected number of bits that describe the set
of possible messages is reduced by $H_b(p)$.

Algorithm 2 Decoding Procedure

while Set of possible messages contains more than one message do
    Label the unit interval according to $L_1$.
    while Received symbol is an erasure do
        Exchange the labels of $[\bar{p}\bar{q}, \bar{p})$ and $[\bar{p}, 1)$.
    end while
    if Received symbol is '0' then
        Denote the messages within sub-intervals which are labelled '0' as the set of possible messages.
    else
        Denote the messages within sub-intervals which are labelled '1' as the set of possible messages.
        Ignore the next received symbol.
    end if
    Expand the set of possible messages to the unit interval.
end while


Proof: Assume that the set of possible messages is of size $k$; upon a successful transmission,
if '0' is received then the new set of possible messages has size $\bar{p}k$, and if '1' is received then its
new size is $pk$. The expected number of bits that is required to describe the new set of possible
messages is $\bar{p}\log_2(\bar{p}k) + p\log_2(pk) = \log_2 k - H_b(p)$. ■
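The identity used in this proof can be checked numerically. The short Python sketch below is illustrative only; the helper names are ours, and set sizes are treated as real numbers, ignoring the rounding handled in Appendix C.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def expected_bits_after_success(k, p):
    """Expected log2-size of the candidate set after one successful transmission."""
    return (1 - p) * log2((1 - p) * k) + p * log2(p * k)


k, p = 1024, 0.3
reduction = log2(k) - expected_bits_after_success(k, p)
print(reduction, binary_entropy(p))   # the two numbers should agree
```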

The next step is to calculate the expected number of channel uses for a complete procedure.
We define a complete procedure to consist of all transmissions by the encoder starting at some
time $t$ at which it is in the ground state, and ending at the first time $t' > t$ at which it returns to
the ground state. In other words, a procedure is completed when a '0' or '1' is received at the
decoder, including one extra channel use in the case when a '1' has been received and has to be
followed by '0'.


Let $N$ be a random variable corresponding to the number of channel uses within a complete
procedure. The expected value of $N$ will be calculated by the law of total expectation. Define
an indicator function

$$\theta = \begin{cases} 0 & \text{if the received bit is '0'}, \\ 1 & \text{if the received bit is '1'}, \end{cases}$$

and consider

$$\begin{aligned}
E[N] &\overset{(a)}{=} E\big[E[N|\theta]\big] \\
&\overset{(b)}{=} E\left[\frac{1}{1-\epsilon} + \theta\right] \\
&\overset{(c)}{=} \frac{1}{1-\epsilon} + p,
\end{aligned}$$

where (a) follows from the law of total expectation, (b) follows from the fact that the channel is
memoryless and, therefore, $\frac{1}{1-\epsilon}$ is the expected time to receive a symbol which is not
an erasure, and (c) follows from $E[\theta] = \Pr(\theta = 1) = p$.
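A quick Monte Carlo experiment illustrates this expectation. The Python sketch below is ours and uses the same idealisation as the text, namely that the received bit is $\mathrm{Ber}(p)$ and erasures are i.i.d. $\mathrm{Ber}(\epsilon)$; the function name is hypothetical.

```python
import random


def sample_procedure_length(p, eps, rng):
    """Channel uses in one complete procedure (idealised: received bit ~ Ber(p))."""
    uses = 1
    while rng.random() < eps:     # each erasure costs one more channel use
        uses += 1
    if rng.random() < p:          # a received '1' is followed by a forced '0'
        uses += 1
    return uses


rng = random.Random(0)
p, eps, trials = 0.3, 0.5, 100_000
mean = sum(sample_procedure_length(p, eps, rng) for _ in range(trials)) / trials
print(mean, 1 / (1 - eps) + p)    # empirical mean against 1/(1 - eps) + p
```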

Finally, we prove the second part of Theorem 1; specifically, the rate of this coding scheme
can be arbitrarily close to the capacity expression, $C^{fb}_\epsilon$.

Proof: It follows from the law of large numbers that the rate of our coding scheme can be
arbitrarily close to the expected number of received bits within a complete procedure divided
by the expected number of channel uses within a complete procedure. In Lemma 3 we showed
that within a successful transmission, the expected number of received bits is $H_b(p)$. Moreover,
the expected number of channel uses within a complete procedure is $E[N] = \frac{1}{1-\epsilon} + p$. Therefore,
the rate of the code can be arbitrarily close to $R = \frac{H_b(p)}{\frac{1}{1-\epsilon} + p}$. ■
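The rate expression is easy to evaluate numerically. The Python sketch below (our own illustration; the grid search and function names are assumptions, not the paper's method) maximises $R$ over $p \in [0, 1/2]$. For $\epsilon = 0$ the maximum agrees numerically with $\log_2$ of the golden ratio, the known capacity of the $(1,\infty)$-RLL constraint.

```python
from math import log2, sqrt


def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def scheme_rate(p, eps):
    """Rate H_b(p) / (p + 1/(1 - eps)) of the coding scheme with parameter p."""
    return binary_entropy(p) / (p + 1.0 / (1.0 - eps))


def feedback_capacity(eps, grid=20_000):
    """Grid search over p in [0, 1/2]; returns (capacity estimate, maximising p)."""
    best = max(range(grid + 1), key=lambda i: scheme_rate(0.5 * i / grid, eps))
    p_star = 0.5 * best / grid
    return scheme_rate(p_star, eps), p_star


for eps in (0.0, 0.25, 0.5):
    print(eps, feedback_capacity(eps))
print(log2((1 + sqrt(5)) / 2))    # reference value for eps = 0
```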

The above proof and Theorem 0 conclude the proof of our main result Theorem |U 


VII. Non-causal Capacity 

In this section, we prove Theorem 2 by showing that $C^{nc}_\epsilon = \max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}$. Operational
considerations of non-causal and feedback capacities reveal the trivial inequality $C^{nc}_\epsilon \ge C^{fb}_\epsilon$.
Furthermore, we derive in this section an upper bound on $C^{nc}_\epsilon$ that is equal to $C^{fb}_\epsilon$, and this
concludes the proof of Theorem 2 with $C^{nc}_\epsilon = C^{fb}_\epsilon$.

The next lemma shows that it is sufficient to consider encoders which transmit '0' if an erasure
occurs, i.e., $x_i = 0$ if $\theta_i = 1$. The intuition behind this lemma is that replacing erased ones with
zeros does not affect the output sequence, while the input constraint is not violated.

Lemma 4. For any $(n, 2^{nR}, (1, \infty))$ code $\mathcal{C}$ with probability of error $P_e^{(n)}$, there exists an
$(n, 2^{nR}, (1, \infty))$ code $\tilde{\mathcal{C}}$ with probability of error $P_e^{(n)}$, satisfying

$$\tilde{f}_i(m, y^{i-1}, \theta_i = 1) = 0, \quad i = 1, \dots, n, \quad \forall (m, y^{i-1}).$$


Proof: For any $(n, 2^{nR}, (1, \infty))$ code $\mathcal{C}$ consisting of encoding functions, $\{f_i(\cdot)\}_{i=1}^n$, and
a decoding function $\Psi(\cdot)$ with probability of error $P_e^{(n)}$, define a new sequence of encoding
functions as follows:

$$\tilde{f}_i(m, y^{i-1}, \theta_i) = \begin{cases} f_i(m, y^{i-1}, \theta_i) & \text{if } \theta_i = 0, \\ 0 & \text{if } \theta_i = 1, \end{cases} \tag{13}$$

for all $(m, y^{i-1})$ and $i = 1, \dots, n$. We argue that $\{\tilde{f}_i(\cdot)\}_{i=1}^n$ and the original decoding function
$\Psi(\cdot)$ determine a new code with the same probability of error $P_e^{(n)}$. First, the set of encoding
functions, $\{\tilde{f}_i(\cdot)\}_{i=1}^n$, satisfies the input constraint, since we replaced ones with zeros. Further,
the output sequence is not affected by our modification, since we replaced only bits that are
erased, and therefore, our new code also has probability of error $P_e^{(n)}$.


We introduce a $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder, which outputs sequences $X^n$ that satisfy two
constraints:

1) The $(1, \infty)$-RLL constraint.

2) $x_i = 0$ if $\theta_i = 1$ (the constraint induced by Lemma 4).

The second constraint can be viewed as a "random constraint" since $\theta_i \sim \mathrm{Ber}(\epsilon)$, while the first
constraint is a deterministic constraint. Thus, the $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder combines both
deterministic and random constraints.

The entropy rate of a $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder is measured by $\frac{1}{n}\sum_{i=1}^{n} H(X_i | X^{i-1}, \theta^i)$,
since this is the available information at the encoder. The next lemma provides an upper bound
on the entropy rate of sequences that can be generated by a $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder.

Lemma 5. The entropy rate of sequences that are generated by a $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder
is upper bounded by $\max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}$.

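Proof: Recall that the encoder can choose its output bit, $x_i$, only if $x_{i-1} = \theta_i = 0$; we
parameterize this by $p(x_i = 1 | x_{i-1} = 0, \theta_i = 0) = p$, where $p \in [0, 1]$. Now, consider the
transition probability matrix of the chain $X^n$,

$$Q = \begin{bmatrix} \epsilon + \bar{\epsilon}\bar{p} & \bar{\epsilon}p \\ 1 & 0 \end{bmatrix},$$

where the transition probability $\epsilon + \bar{\epsilon}\bar{p}$ was calculated by

$$p(x_i = 0 | x_{i-1} = 0) = \sum_{\theta_i} p(x_i = 0, \theta_i | x_{i-1} = 0).$$

The stationary distribution of this chain is $[\pi(0) \;\; \pi(1)] = \left[\frac{1}{1 + \bar{\epsilon}p} \;\; \frac{\bar{\epsilon}p}{1 + \bar{\epsilon}p}\right]$.
Consider the next upper bound for some $i$,

$$\begin{aligned}
H(X_i | X^{i-1}, \theta^i) &\overset{(a)}{\le} H(X_i | X_{i-1}, \theta_i) \\
&\overset{(b)}{=} H(X_i | X_{i-1}, \theta_i = 0)\,\bar{\epsilon} \\
&\overset{(c)}{=} H(X_i | x_{i-1} = 0, \theta_i = 0)\, p(x_{i-1} = 0 | \theta_i = 0)\,\bar{\epsilon} \\
&\overset{(d)}{=} H_b(p)\, p(x_{i-1} = 0)\,\bar{\epsilon},
\end{aligned} \tag{14}$$

where (a) follows from the fact that conditioning reduces entropy, (b) follows from $H(X_i | X_{i-1}, \theta_i = 1) = 0$, (c)
follows from $H(X_i | x_{i-1} = 1, \theta_i = 0) = 0$, and (d) follows from the fact that $X_{i-1}$ is independent
of $\theta_i$ and substituting the parameter $p$.

By substituting the stationary distribution $p(x_{i-1} = 0) = \pi(0)$ into (14), we see that the
entropy rate of the chain is upper bounded by $\frac{\bar{\epsilon}H_b(p)}{1 + \bar{\epsilon}p}$, for some $p \in [0, 1]$. This term can also be
written as $\frac{H_b(p)}{p + \frac{1}{\bar{\epsilon}}}$, and the parameter $p$ need be maximized only on $[0, 0.5]$ from Lemma 1. ■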
The rate of the message $M$ is upper bounded by the entropy rate of sequences that can be
generated by a $(1, \infty, \mathrm{Ber}(\epsilon))$-RLL encoder, and this concludes the proof of Theorem 2 with

$$C^{nc}_\epsilon \le \max_{0 \le p \le \frac{1}{2}} \frac{H_b(p)}{p + \frac{1}{1-\epsilon}}.$$
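The stationary distribution and the resulting bound in the proof of Lemma 5 can be verified numerically. The Python sketch below is illustrative: it uses plain power iteration on the two-state chain, and the function names are ours.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def stationary(p, eps, iters=200):
    """Stationary distribution of the chain with states {0, 1}, by power iteration."""
    e_bar = 1.0 - eps
    q = [[eps + e_bar * (1 - p), e_bar * p],   # from state 0
         [1.0, 0.0]]                           # from state 1: a '0' must follow a '1'
    pi = [0.5, 0.5]
    for _ in range(iters):
        pi = [pi[0] * q[0][0] + pi[1] * q[1][0],
              pi[0] * q[0][1] + pi[1] * q[1][1]]
    return pi


p, eps = 0.3, 0.4
pi0, pi1 = stationary(p, eps)
bound = (1 - eps) * binary_entropy(p) * pi0    # right-hand side of (14) at stationarity
print(pi0, 1 / (1 + (1 - eps) * p))
print(bound, binary_entropy(p) / (p + 1 / (1 - eps)))
```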




VIII. Conclusions 

We considered the setup of an input-constrained erasure channel with feedback and found its
capacity using an equivalent DP formulation. We then pursued the complementary derivation of a simple,
error-free, capacity-achieving coding scheme, which we found using the strong relation between
optimal policies in DP and encoding procedures in channel coding. Moreover, we have shown
that the capacity remains the same even if the erasures are known non-causally to the encoder.

Following the theorem that feedback does not increase the capacity of a memoryless channel 
HD, Shannon also argued that this theorem can be extended to channels with memory if the 
channel state can be computed at the encoder. Our system setting satisfies this criterion, since the
previous input of the channel can be thought of as the channel state. The proof of Shannon's
argument was omitted, although it is not trivial, and the claim still stands as a conjecture.

Following Shannon's conjecture, it could be interesting to derive the capacity of the input-constrained
erasure channel with delayed feedback, namely, when the input to the channel at time
$i$ depends on the message and the tuple $Y^{i-\nu}$, where $\nu$ is the delay of the feedback. A dynamic
programming formulation for the delayed-feedback capacity is feasible and could shed light
on Shannon's conjecture and on the capacity of the input-constrained erasure channel without
feedback. Furthermore, a model with arbitrarily delayed feedback will provide a new upper bound
for the capacity of the input-constrained BEC without feedback, a problem that is wide open.

Appendix A
Proof of Lemma 1

Proof of Lemma 1:

• A sufficient condition for the concavity of a function $f(p)$ is that the second derivative is
negative for any value of $p$. We denote $k = \frac{1}{\bar{\epsilon}}$ and find a condition on $k$ such that the
second derivative is negative. To simplify the derivations, we take $H_b(\cdot)$ to be the binary
entropy with the natural logarithm base, since multiplication by a constant does not affect
concavity. Calculation shows that

$$\frac{d^2}{dp^2}\left(\frac{H_b(p)}{p + k}\right) = \frac{-\frac{(p+k)^2}{p\bar{p}} - 2k\ln\frac{\bar{p}}{p} - 2\ln(1-p)}{(p+k)^3}. \tag{15}$$

It suffices to examine the sign of the numerator, since $(p+k)^3 > 0$. Define $g(p) = -\frac{(p+k)^2}{p\bar{p}} - 2k\ln\frac{\bar{p}}{p} - 2\ln(1-p)$. Derivation of the maximum for $g(p)$ shows that it has only one
maximum, which is at $p = \frac{1}{2}$. Substituting gives $g(\frac{1}{2}) = -4(\frac{1}{2} + k)^2 + 2\ln 2$. It then follows that
$g(p) < 0$, $\forall p \in [0, 1]$, if and only if $k > \sqrt{\frac{\ln 2}{2}} - \frac{1}{2} \approx 0.088$. This condition holds here since $k = \frac{1}{\bar{\epsilon}} \ge 1$.

• Derivation of the first derivative of $f(p)$ shows that the derivative is equal to zero if and
only if $p^{\frac{1}{\bar{\epsilon}}} = (1-p)^{1+\frac{1}{\bar{\epsilon}}}$ holds. The uniqueness of the maximum point follows from the fact
that $p^{\frac{1}{\bar{\epsilon}}}$ increases as $p$ grows, while $(1-p)^{1+\frac{1}{\bar{\epsilon}}}$ decreases with a growing $p$.

Now, assume that the maximum is $p_m \in (\frac{1}{2}, 1]$. Symmetry of the binary entropy function
implies $H_b(p_m) = H_b(\bar{p}_m)$, and therefore, it is sufficient to examine the denominator. Since
both arguments $p_m, \bar{p}_m \in [0, 1]$ and $\bar{p}_m < p_m$, it then follows that $f(p_m) < f(\bar{p}_m)$, which is a contradiction.

• This property follows from substituting the relation $p^{\frac{1}{\bar{\epsilon}}} = (1-p)^{1+\frac{1}{\bar{\epsilon}}}$ into the function $f(p)$.
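The computations in this appendix can be cross-checked numerically. The Python sketch below is only an illustration with our own function names: it compares the numerator $g(p)$ of (15) with a finite-difference second derivative, and solves the stationarity condition $k\ln p = (1+k)\ln(1-p)$ by bisection.

```python
from math import log, sqrt


def f(p, k):
    """H_b(p) / (p + k) with the natural-logarithm binary entropy."""
    return (-p * log(p) - (1 - p) * log(1 - p)) / (p + k)


def g(p, k):
    """Numerator of the second derivative of f, as in (15)."""
    return -(p + k) ** 2 / (p * (1 - p)) - 2 * k * log((1 - p) / p) - 2 * log(1 - p)


def stationary_point(k, tol=1e-12):
    """Bisection on k*ln(p) - (1+k)*ln(1-p), which is increasing in p on (0, 1)."""
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k * log(mid) - (1 + k) * log(1 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


k, p, h = 2.0, 0.3, 1e-4
finite_diff = (f(p + h, k) - 2 * f(p, k) + f(p - h, k)) / h ** 2
print(finite_diff, g(p, k) / (p + k) ** 3)   # both should be negative and close
print(stationary_point(1.0))                 # maximiser for k = 1, i.e. eps = 0
print(sqrt(log(2) / 2) - 0.5)                # concavity threshold on k
```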


Appendix B 
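Proof of Theorem 5

The next lemma is technical and will be useful in the proof of Theorem 5.

Lemma 6. The function $f_\epsilon(z) = \bar{\epsilon}H_b(z) - z\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}$ is concave on $[0, 1]$ and its maximum is at
$z = p_\epsilon$, where $p_\epsilon = \arg\max_{0 \le p \le 1} \frac{H_b(p)}{p + \frac{1}{\bar{\epsilon}}}$.

Proof of Lemma 6: The concavity of $f_\epsilon(z)$ on $z \in [0, 1]$ follows from the concavity of the
binary entropy function, and therefore, it suffices to show that the first derivative of $f_\epsilon(z)$ at $p_\epsilon$
is equal to zero. The definition of $p_\epsilon$, (11), and Lemma 1 imply the relation $\frac{d}{dz}\left[\frac{H_b(z)}{z + \frac{1}{\bar{\epsilon}}}\right]_{z = p_\epsilon} = 0$,
which is equivalent to

$$H_b'(p_\epsilon)\left(p_\epsilon + \frac{1}{\bar{\epsilon}}\right) - H_b(p_\epsilon) = 0. \tag{16}$$

The first derivative of $f_\epsilon(z)$ at the point $p_\epsilon$ is:

$$\begin{aligned}
\frac{d}{dz}\left[\bar{\epsilon}H_b(z) - z\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}\right]_{z = p_\epsilon}
&= \bar{\epsilon}H_b'(z)\Big|_{z = p_\epsilon} - \bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \frac{\bar{\epsilon}H_b'(p_\epsilon)\left(p_\epsilon + \frac{1}{\bar{\epsilon}}\right) - \bar{\epsilon}H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&\overset{(a)}{=} 0,
\end{aligned}$$

where (a) follows from (16). ■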

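We proceed to the proof of Theorem 5.

Proof of Theorem 5: Substituting $z = 0$ into (10) yields $\rho_\epsilon + h_\epsilon(0) = h_\epsilon(1)$. It can be
shown that if $h_\epsilon(z)$ solves (10), then any function of the form $h_\epsilon(z) + \text{constant}$ also solves this
equation. Therefore, we can fix $h_\epsilon(0) = 0$, which implies that $h_\epsilon(1) = \rho_\epsilon$. It then follows that
the DP operator with the function $h^*(z)$ is:

$$(Th^*)(z) = \sup_{0 \le \delta \le z} \bar{\epsilon}H_b(\delta) + \bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \epsilon h^*(1 - \delta).$$

Now, the term $h^*(1 - \delta)$ is calculated for two cases:

$$h^*(1 - \delta) = \begin{cases}
\bar{\epsilon}H_b(\delta) - (1 - \delta)\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} & \text{if } 1 - \delta < p_\epsilon, \\
\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} & \text{if } 1 - \delta \ge p_\epsilon.
\end{cases} \tag{17}$$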

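To complete the proof, we have three cases for calculating the operator $(Th^*)(z)$:

• For $0 \le z \le p_\epsilon$, the constraint $0 \le \delta \le z$ implies that $0 \le \delta \le p_\epsilon$, and from (17), we have
$h^*(1 - \delta) = \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}$. Let us show that (10) is satisfied:

$$\begin{aligned}
(Th^*)(z) &= \sup_{0 \le \delta \le z} \bar{\epsilon}H_b(\delta) + \bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \epsilon\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \sup_{0 \le \delta \le z} \bar{\epsilon}H_b(\delta) - \delta\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&\overset{(a)}{=} \bar{\epsilon}H_b(z) - z\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= h^*(z) + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}},
\end{aligned}$$

where (a) follows from Lemma 6.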

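• For $p_\epsilon \le z \le 1 - p_\epsilon$, the same calculation as for the previous interval shows that $h^*(1 - \delta) =
\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}$ for all $\delta \in [0, 1 - p_\epsilon]$. Let us show that (10) is satisfied:

$$\begin{aligned}
(Th^*)(z) &= \sup_{0 \le \delta \le z} \bar{\epsilon}H_b(\delta) + \bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \epsilon\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \sup_{0 \le \delta \le z} \bar{\epsilon}H_b(\delta) - \delta\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&\overset{(a)}{=} \bar{\epsilon}H_b(p_\epsilon) - p_\epsilon\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= h^*(z) + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}},
\end{aligned}$$

where (a) follows from Lemma 6.

• For $1 - p_\epsilon \le z \le 1$, the function $h^*(1 - \delta)$ can have different terms, and therefore, we
separate:

$$\begin{aligned}
(Th^*)(z) &= \max\Bigg\{ \sup_{0 \le \delta \le 1 - p_\epsilon} \bar{\epsilon}H_b(\delta) + \bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \epsilon\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}, \\
&\qquad\quad \sup_{1 - p_\epsilon < \delta \le z} \bar{\epsilon}H_b(\delta) + \bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \epsilon\left[\bar{\epsilon}H_b(\delta) - (1 - \delta)\bar{\epsilon}\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}\right] \Bigg\} \\
&\overset{(a)}{=} \max\Bigg\{ \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}, \; \sup_{1 - p_\epsilon < \delta \le z} \bar{\epsilon}(1 + \epsilon)H_b(\delta) + \bar{\epsilon}\bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \Bigg\} \\
&\overset{(b)}{=} 2\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= h^*(z) + \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}},
\end{aligned}$$

where (a) follows from Lemma 6 and (b) follows from

$$\begin{aligned}
\sup_{1 - p_\epsilon < \delta \le z} \bar{\epsilon}(1 + \epsilon)H_b(\delta) + \bar{\epsilon}\bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}
&\le \sup_{1 - p_\epsilon < \delta \le z} \bar{\epsilon}(1 + \epsilon)H_b(\delta) + \sup_{1 - p_\epsilon < \delta \le z} \bar{\epsilon}\bar{\epsilon}(1 - \delta)\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \bar{\epsilon}(1 + \epsilon)H_b(1 - p_\epsilon) + \bar{\epsilon}\bar{\epsilon}(1 - (1 - p_\epsilon))\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}} \\
&= \frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}\left[2\bar{\epsilon}p_\epsilon + 1 + \epsilon\right] \\
&\le 2\frac{H_b(p_\epsilon)}{p_\epsilon + \frac{1}{\bar{\epsilon}}}.
\end{aligned}$$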
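The three cases above can be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes the operator $(Th^*)(z)$ in the form given in this proof, takes $h^*(z) = \bar{\epsilon}H_b(z) - z\bar{\epsilon}\rho$ for $z < p_\epsilon$ and $h^*(z) = \rho$ otherwise (our reading of (17), with $\rho = H_b(p_\epsilon)/(p_\epsilon + 1/\bar{\epsilon})$), and replaces the supremum by a maximum over a finite grid of actions. All function names are ours.

```python
from math import log2


def hb(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def optimal_p(eps, tol=1e-13):
    """Root of k*log(p) = (1+k)*log(1-p) with k = 1/(1-eps), found by bisection."""
    k = 1.0 / (1.0 - eps)
    lo, hi = 1e-12, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k * log2(mid) < (1 + k) * log2(1 - mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bellman_gap(eps, z, grid=400):
    """|(T h*)(z) - (rho + h*(z))| with the sup over delta taken on a finite grid."""
    e_bar = 1.0 - eps
    p = optimal_p(eps)
    rho = hb(p) / (p + 1.0 / e_bar)

    def h_star(s):
        return e_bar * hb(s) - s * e_bar * rho if s < p else rho

    # Candidate actions in [0, z]; p is added so that the maximiser lies on the grid.
    deltas = [z * i / grid for i in range(grid + 1)] + ([p] if p <= z else [])
    value = max(e_bar * hb(d) + e_bar * (1 - d) * rho + eps * h_star(1 - d)
                for d in deltas)
    return abs(value - (rho + h_star(z)))


print(max(bellman_gap(0.3, i / 50) for i in range(51)))   # should be numerically zero
```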


Appendix C 

Accurate rate analysis 

The rate analysis in Section VI-B was simplified by assuming that each transmitted bit is $\mathrm{Ber}(p)$.
Here, we show precisely that the rate of our coding scheme can be arbitrarily close to $C^{fb}_\epsilon$. The idea is to
separate the coding scheme into two parts using a parameter $A$, which is a fixed constant. First,
we use the coding scheme from Section VI-B to transmit a large number, $nR - A$, of message
bits, while a different coding scheme will be used to transmit the remaining $A$ bits. We show that
the rate of the overall scheme is essentially determined by the rate of the first coding scheme.
The next lemma will be used for the rate analysis of the first coding scheme.

Lemma 7. Each transmitted bit, $X_i$, can be chosen to be distributed as $\mathrm{Ber}(p - e_i)$, where
$0 \le e_i < \frac{1}{|\mathcal{M}_{i-1}|}$.

Proof: Assume that at time $i$, a procedure begins and its corresponding set of possible
messages is $\mathcal{M}_{i-1}$. According to $L_1$, the number of messages that are labelled '1' is $\lfloor p|\mathcal{M}_{i-1}| \rfloor$,
where $\lfloor \cdot \rfloor$ is the floor operator. The resulting input distribution is $X_i \sim \mathrm{Ber}\left(\frac{\lfloor p|\mathcal{M}_{i-1}| \rfloor}{|\mathcal{M}_{i-1}|}\right)$, which
can also be written as $X_i \sim \mathrm{Ber}(p - e_i)$ since $p - \frac{1}{|\mathcal{M}_{i-1}|} < \frac{\lfloor p|\mathcal{M}_{i-1}| \rfloor}{|\mathcal{M}_{i-1}|} \le p$.

In case of erasure at time $i$, recall that the number of messages that were labelled '0' in $L_1$ is
greater than the number of messages labelled '1', and thus, we are able to construct the labelling
$L_2$ as follows: $\lfloor p|\mathcal{M}_{i-1}| \rfloor$ messages that were labelled '0' at the previous transmission are flipped
to '1', and all the remaining messages are labelled '0'. It is clear that the input distribution is
preserved in this case, and upon consecutive erasures, $L_1$ and $L_2$ are exchanged and the
input distribution is not changed. Note that the choices of labelling are made in advance and
both encoder and decoder agree on the current labelling. ■

The encoding procedure occurs repeatedly and is over when the size of the set of possible messages
is less than or equal to $2^A$. Denote by $e_1, e_2, \dots, e_k$ the correction factors for the $k$ successful
transmissions until the scheme is over. Following the same derivations as in Section VI-B, it follows
that the rate is $R_1 = \frac{\sum_{i=1}^{k} H_b(p - e_i)}{k\left(\frac{1}{1-\epsilon} + p\right) - \sum_{i=1}^{k} e_i}$.

For the $A$ remaining bits, we use a code in which each message bit is followed by a zero, and
this pair is transmitted repeatedly until a successful transmission. Thus, to send the message
bit '0', the pair '00' is repeated until '00' or '0?' is received, and to send the message bit '1',
the pair '10' is transmitted repeatedly until a '1' is received. The decoding for this scheme is
straightforward, and calculation of the rate gives $R_2 = \frac{1-\epsilon}{2}$.

To summarize, the average rate for the overall coding scheme is

$$\bar{R} = \left(\frac{nR - A}{nR}\right)R_1 + \left(\frac{A}{nR}\right)R_2.$$

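Consider the next lower bound on the average rate $\bar{R}$,

$$\begin{aligned}
\bar{R} &= \left(\frac{nR - A}{nR}\right)\frac{\sum_{i=1}^{k} H_b(p - e_i)}{k\left(\frac{1}{1-\epsilon} + p\right) - \sum_{i=1}^{k} e_i} + \left(\frac{A}{nR}\right)\frac{1-\epsilon}{2} \\
&\ge \left(\frac{nR - A}{nR}\right)\frac{k\min_i H_b(p - e_i)}{k\left(\frac{1}{1-\epsilon} + p\right) - k\min_i e_i} + \left(\frac{A}{nR}\right)\frac{1-\epsilon}{2} \\
&\overset{(a)}{\ge} \left(\frac{nR - A}{nR}\right)\frac{H_b(p - 2^{-A})}{\frac{1}{1-\epsilon} + p} + \left(\frac{A}{nR}\right)\frac{1-\epsilon}{2},
\end{aligned}$$

where (a) follows from Lemma 7, namely, $e_i \in [0, 2^{-A})$ for $i = 1, \dots, k$.

Letting $n \to \infty$, we see that $R^* = \frac{H_b(p - 2^{-A})}{\frac{1}{1-\epsilon} + p}$ is achievable. Thus, by choosing $A$ to be arbitrarily
large (but still finite), we can make $R^*$ arbitrarily close to the capacity $C^{fb}_\epsilon$.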
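The effect of the parameter $A$ on the achievable rate can be illustrated numerically. The Python sketch below is ours (function names are hypothetical) and evaluates $R^*$ for increasing $A$ at fixed $p$ and $\epsilon$, assuming $2^{-A} < p \le 1/2$.

```python
from math import log2


def binary_entropy(x):
    """Binary entropy H_b(x) in bits."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def achievable_rate(p, eps, a):
    """R* = H_b(p - 2^-A) / (1/(1 - eps) + p); assumes 2^-A < p <= 1/2."""
    return binary_entropy(p - 2.0 ** (-a)) / (1.0 / (1.0 - eps) + p)


p, eps = 0.3, 0.5
limit = binary_entropy(p) / (1.0 / (1.0 - eps) + p)
for a in (2, 5, 10, 20, 30):
    print(a, achievable_rate(p, eps, a), limit)
```

The gap to the limiting rate shrinks as $A$ grows, since $H_b$ is increasing on $[0, 1/2]$ and $2^{-A}$ vanishes.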
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